as $\omega$ and $\hat{x}$ enter bilinearly through the product $\omega \circ \hat{x}^{[k]}$. In our discrete optimization framework, the discrete values of the convolutional kernels are updated according to their gradients. Taking Eq. 3.36 into consideration, we derive the update rule for $\hat{x}^{[k+1]}$ as
$$\hat{x}^{[k+1]} = \hat{x}^{[k]} - \eta \frac{\partial f(\omega, \hat{x}^{[k]})}{\partial \hat{x}^{[k]}} = \hat{x}^{[k]} - \omega \circ \eta\,\delta^{[k]}_{\hat{x}}. \tag{3.37}$$
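The second equality is worth unpacking. The following one-line derivation is a hedged reading, assuming that $\circ$ denotes the element-wise product and that $\delta^{[k]}_{\hat{x}}$ stands for the gradient of $f$ with respect to the product $\omega \circ \hat{x}^{[k]}$; both assumptions are ours, not stated explicitly here:
$$\frac{\partial f(\omega, \hat{x}^{[k]})}{\partial \hat{x}^{[k]}}
= \frac{\partial f}{\partial\bigl(\omega \circ \hat{x}^{[k]}\bigr)} \circ \frac{\partial\bigl(\omega \circ \hat{x}^{[k]}\bigr)}{\partial \hat{x}^{[k]}}
= \delta^{[k]}_{\hat{x}} \circ \omega,$$
which, scaled by the learning rate $\eta$, gives the term $\omega \circ \eta\,\delta^{[k]}_{\hat{x}}$ above.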
By plugging Eq. 3.37 into Eq. 3.35, we obtain a new objective, or loss function, that minimizes
$$\|\hat{x}^{[k+1]} - \omega \circ x\|, \tag{3.38}$$
to approximate
$$\hat{x} = \omega \circ x, \qquad x = \omega^{-1} \circ \hat{x}. \tag{3.39}$$
We further consider multiple projections, based on Eq. 3.39 and the projection loss in Eq. 3.34, and have
$$\min \; \frac{1}{2} \sum_{j}^{J} \|x - \omega_j^{-1} \circ \hat{x}_j\|^2. \tag{3.40}$$
We set $g(x) = \frac{1}{2}\sum_{j}^{J} \|x - \omega_j^{-1} \circ \hat{x}_j\|^2$ and solve $g'(x) = 0$, which gives
$$x = \frac{1}{J}\sum_{j}^{J} \omega_j^{-1} \circ \hat{x}_j, \tag{3.41}$$
showing that multiple projections can better reconstruct the full kernels from their binary counterparts.
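Eq. 3.41 simply averages the per-projection reconstructions $\omega_j^{-1} \circ \hat{x}_j$. The following NumPy sketch illustrates this numerically; it is not taken from the original implementation, the function name `reconstruct_from_projections` is ours, and it assumes $\circ$ is the element-wise product with element-wise non-zero $\omega_j$, so that $\omega_j^{-1}$ is the element-wise reciprocal.

```python
import numpy as np

def reconstruct_from_projections(omegas, x_hats):
    """Average the per-projection reconstructions (Eq. 3.41).

    omegas : list of J arrays, projection parameters w_j (assumed element-wise non-zero).
    x_hats : list of J arrays, quantized kernels x_hat_j = w_j * x.
    """
    recons = [x_hat / omega for omega, x_hat in zip(omegas, x_hats)]
    return np.mean(recons, axis=0)

# Toy check: with exact projections x_hat_j = w_j * x, the average recovers x.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 3))
omegas = [rng.uniform(0.5, 2.0, size=(3, 3)) for _ in range(4)]
x_hats = [omega * x for omega in omegas]
assert np.allclose(reconstruct_from_projections(omegas, x_hats), x)
```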
3.5.4 Projection Convolutional Neural Networks
PCNNs, shown in Fig. 3.12, use DBPP for model quantization. We accomplish this by reformulating the projection loss of Eq. 3.34 within the deep learning paradigm as
$$L_P = \frac{\lambda}{2} \sum_{l,i}^{L,I} \sum_{j}^{J} \bigl\|\hat{C}^{l,[k]}_{i,j} - W^{l,[k]}_{j} \circ \bigl(C^{l,[k]}_{i} + \eta\,\delta_{\hat{C}^{l,[k]}_{i,j}}\bigr)\bigr\|^2, \tag{3.42}$$
where $C^{l,[k]}_{i}$, $l \in \{1, \ldots, L\}$, $i \in \{1, \ldots, I\}$, denotes the $i$th kernel tensor of the $l$th convolutional layer in the $k$th iteration. $\hat{C}^{l,[k]}_{i,j}$ is the quantized kernel of $C^{l,[k]}_{i}$ obtained via the projection $P^{l,j}_{\Omega}$, $j \in \{1, \ldots, J\}$, as
$$\hat{C}^{l,[k]}_{i,j} = P^{l,j}_{\Omega}\bigl(W^{l,[k]}_{j}, C^{l,[k]}_{i}\bigr), \tag{3.43}$$
where $W^{l,[k]}_{j}$ is a tensor calculated by duplicating a learned projection matrix $W^{l,[k]}_{j}$ along the channels, so that it fits the dimension of $C^{l,[k]}_{i}$. $\delta_{\hat{C}^{l,[k]}_{i,j}}$ is the gradient at $\hat{C}^{l,[k]}_{i,j}$ calculated based on $L_S$, that is, $\delta_{\hat{C}^{l,[k]}_{i,j}} = \frac{\partial L_S}{\partial \hat{C}^{l,[k]}_{i,j}}$. The iteration index $[k]$ is omitted hereafter for simplicity.
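To make Eq. 3.42 concrete, here is a minimal PyTorch-style sketch of the projection loss for a single layer and a single kernel tensor. It is an illustration under our own assumptions, not the authors' implementation: $\circ$ is taken to be the element-wise product, the projection $P_\Omega$ is assumed to round the scaled kernel onto the discrete set $\Omega_N$ (the exact projection is defined elsewhere in the text), and the names `project_to_omega` and `projection_loss` are ours.

```python
import torch

def project_to_omega(z, omega_levels):
    """Map each element of z to the nearest value in the discrete set Omega_N
    (one plausible form of P_Omega; assumed here for illustration)."""
    levels = torch.as_tensor(omega_levels, dtype=z.dtype, device=z.device)  # shape (N,)
    idx = (z.unsqueeze(-1) - levels).abs().argmin(dim=-1)
    return levels[idx]

def projection_loss(C, W_list, delta_list, omega_levels, eta, lam):
    """Projection loss of Eq. 3.42 for one kernel tensor C of one layer.

    C          : full-precision kernel tensor C_i^l.
    W_list     : J projection tensors W_j^l, already duplicated to C's shape.
    delta_list : J gradients of L_S w.r.t. each quantized kernel C_hat_{i,j}^l.
    """
    loss = C.new_zeros(())
    for W_j, delta_j in zip(W_list, delta_list):
        C_hat_j = project_to_omega(W_j * C, omega_levels)        # Eq. 3.43 (assumed form)
        loss = loss + ((C_hat_j - W_j * (C + eta * delta_j)) ** 2).sum()
    return 0.5 * lam * loss
```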
In PCNNs, both the cross-entropy loss and the projection loss are used to build the total loss:
$$L = L_S + L_P. \tag{3.44}$$
The proposed projection loss regularizes the continuous values so that they converge onto $\Omega_N$ while the cross-entropy loss is minimized, as illustrated in Fig. 4.15 and Fig. 3.25.
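The following sketch shows how Eq. 3.44 could drive one training step, reusing the illustrative `projection_loss` above. The optimizer, criterion, and the assumption that the $\delta$ gradients are carried over from the previous iteration's backward pass are all ours, not details given in this section.

```python
def train_step(model, inputs, targets, optimizer, criterion,
               kernels, W_lists, delta_lists, omega_levels, eta, lam):
    """One step minimizing the total loss L = L_S + L_P (Eq. 3.44)."""
    optimizer.zero_grad()
    L_S = criterion(model(inputs), targets)            # cross-entropy loss L_S
    L_P = sum(projection_loss(C, Ws, deltas, omega_levels, eta, lam)
              for C, Ws, deltas in zip(kernels, W_lists, delta_lists))
    loss = L_S + L_P                                   # Eq. 3.44
    loss.backward()
    optimizer.step()
    return loss.detach()
```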